# CLIP Architecture

Vit Large Patch14 Clip 224.laion2b

A Vision Transformer model based on the CLIP architecture, specialized for image feature extraction.

Image Classification

Vit Large Patch14 Clip 224.datacompxl

A Vision Transformer model based on the CLIP architecture, designed for image feature extraction and released by LAION.

Image Classification

Vit Base Patch16 Clip 224.laion2b

A Vision Transformer model based on the CLIP architecture that contains only the image encoder, making it suitable for image feature extraction.

Image Classification

Vit Base Patch16 Plus Clip 240.laion400m E31

A vision-language model trained on the LAION-400M dataset that supports zero-shot image classification.

Image Classification
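The ViT entries above encode two numbers in their names: a patch size (`Patch14`, `Patch16`) and an input resolution (`224`, `240`). The following is a minimal sketch, assuming the number after "Patch" is the patch side length in pixels and the number after "Clip" is the square input resolution; the helper `patch_grid` is a hypothetical name introduced here, not part of any library. It shows how many image patches each model would process, not counting any extra class token.

```python
import re
from typing import Tuple


def patch_grid(model_name: str) -> Tuple[int, int]:
    """Return (patches_per_side, total_patches) for a listing name.

    Assumption: the name follows the pattern '... Patch<P> ... Clip <R>...',
    where P is the patch size and R is the input resolution, both in pixels.
    """
    match = re.search(r"patch(\d+).*?clip\s+(\d+)", model_name, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no patch size / resolution found in {model_name!r}")
    patch_size, resolution = int(match.group(1)), int(match.group(2))
    if resolution % patch_size != 0:
        raise ValueError("resolution is not a multiple of the patch size")
    per_side = resolution // patch_size
    return per_side, per_side * per_side


if __name__ == "__main__":
    for name in (
        "Vit Large Patch14 Clip 224.laion2b",
        "Vit Base Patch16 Clip 224.laion2b",
        "Vit Base Patch16 Plus Clip 240.laion400m E31",
    ):
        print(name, patch_grid(name))
```

Under these assumptions, a smaller patch size at the same resolution gives a longer token sequence, which is one reason the Patch14 model is more expensive to run than the Patch16 ones.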

Resnet50x4 Clip.openai

A ResNet50x4 vision-language model based on the CLIP architecture that supports zero-shot image classification.
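Several entries in this list mention zero-shot image classification. As a rough illustration of the scoring step only, the sketch below compares one image embedding against one text embedding per candidate label using cosine similarity, then applies a softmax. The embeddings are toy placeholders (in practice they would come from a model's image and text encoders), and the function names and the `logit_scale` value of 100 are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(
    image_emb: Sequence[float],
    text_embs: Dict[str, Sequence[float]],
    logit_scale: float = 100.0,  # assumed temperature, not read from any model
) -> Dict[str, float]:
    """Return a probability per label from cosine similarities."""
    img = l2_normalize(image_emb)
    logits = {}
    for label, emb in text_embs.items():
        txt = l2_normalize(emb)
        logits[label] = logit_scale * sum(a * b for a, b in zip(img, txt))
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits.values())
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, chosen by hand for illustration.
    image = [1.0, 0.1, 0.0]
    prompts = {
        "a photo of a cat": [0.9, 0.2, 0.0],
        "a photo of a dog": [0.0, 1.0, 0.1],
    }
    probs = zero_shot_probs(image, prompts)
    print(max(probs, key=probs.get))
```

No label-specific training is involved: changing the candidate classes only means changing the text prompts, which is what "zero-shot" refers to here.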

Chinese Clip Vit Base Patch16

A Chinese CLIP model based on the ViT architecture that supports multimodal understanding of images and text.

CLIP ViT B 16 CommonPool.L.clip S1b B8k

A vision-language model based on the CLIP architecture that supports zero-shot image classification.

CLIP ViT B 32 DataComp.M S128m B4k

A vision-language model based on the CLIP architecture, trained on the DataComp.M dataset, that supports zero-shot image classification.

CLIP ViT B 32 CommonPool.M.laion S128m B4k

A vision-language model based on the CLIP architecture that supports zero-shot image classification.

CLIP ViT B 32 CommonPool.S S13m B4k

A vision-language model based on the CLIP architecture that supports zero-shot image classification.
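The model names in this part of the list end in suffixes such as `S13m B4k` or `S128m B4k`. The sketch below is one way to read them, assuming they follow the OpenCLIP-style convention in which `S` gives the number of training samples seen and `B` gives the global batch size, both as rounded figures. The listing itself does not define these suffixes, so treat this interpretation, and the helper name `parse_training_scale`, as assumptions.

```python
import re
from typing import Dict

# Multipliers for the unit letters used in the suffixes (rounded, decimal).
_UNITS = {"k": 10**3, "m": 10**6, "b": 10**9}


def parse_training_scale(model_name: str) -> Dict[str, int]:
    """Extract approximate samples seen and batch size from a listing name.

    Assumption: the name ends with 'S<number><unit> B<number><unit>'.
    """
    match = re.search(
        r"\bS(\d+)([kmb])\s+B(\d+)([kmb])\b", model_name, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"no training-scale suffix found in {model_name!r}")
    samples = int(match.group(1)) * _UNITS[match.group(2).lower()]
    batch = int(match.group(3)) * _UNITS[match.group(4).lower()]
    return {"samples_seen": samples, "batch_size": batch}


if __name__ == "__main__":
    for name in (
        "CLIP ViT B 16 CommonPool.L.clip S1b B8k",
        "CLIP ViT B 32 DataComp.M S128m B4k",
        "CLIP ViT B 32 CommonPool.S S13m B4k",
    ):
        print(name, parse_training_scale(name))
```

Read this way, the three B/32 entries differ mainly in data pool and training scale rather than in architecture, which can help when choosing among otherwise identical descriptions.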

Eva02 Base Patch16 Clip 224.merged2b S8b B131k

A CLIP model based on the EVA02 architecture, suitable for zero-shot image classification.

© 2025 AIbase